Goto

Collaborating Authors

 Derbyshire


Automated Boilerplate: Prevalence and Quality of Contract Generators in the Context of Swiss Privacy Policies

Nenadic, Luka, Rodriguez, David

arXiv.org Artificial Intelligence

It has become increasingly challenging for firms to comply with a plethora of novel digital regulations. This is especially true for smaller businesses that often lack both the resources and know-how to draft complex legal documents. Instead of seeking costly legal advice from attorneys, firms may turn to cheaper alternative legal service providers such as automated contract generators. While these services have a long-standing presence, there is little empirical evidence on their prevalence and output quality. We address this gap in the context of a 2023 Swiss privacy law revision. To enable a systematic evaluation, we create and annotate a multilingual benchmark dataset that captures key compliance obligations under Swiss and EU privacy law. Using this dataset, we validate a novel GPT-5-based method for large-scale compliance assessment of privacy policies, allowing us to measure the impact of the revision. We observe compliance increases indicating an effect of the revision. Generators, explicitly referenced by 18% of local websites, are associated with substantially higher levels of compliance, with increases of up to 15 percentage points compared to privacy policies without generator use. These findings contribute to three debates: the potential of LLMs for cross-lingual legal analysis, the Brussels Effect of EU regulations, and, crucially, the role of automated tools in improving compliance and contractual quality.


SaRoHead: Detecting Satire in a Multi-Domain Romanian News Headline Dataset

Vîrlan, Mihnea-Alexandru, Smădu, Răzvan-Alexandru, Cercel, Dumitru-Clementin, Pop, Florin, Cercel, Mihaela-Claudia

arXiv.org Artificial Intelligence

The primary goal of a news headline is to summarize an event in as few words as possible. Depending on the media outlet, a headline can serve as a means to objectively deliver a summary or improve its visibility. For the latter, specific publications may employ stylistic approaches that incorporate the use of sarcasm, irony, and exaggeration, key elements of a satirical approach. As such, even the headline must reflect the tone of the satirical main content. Current approaches for the Romanian language tend to detect the non-conventional tone (i.e., satire and clickbait) of the news content by combining both the main article and the headline. Because we consider a headline to be merely a brief summary of the main article, we investigate in this paper the presence of satirical tone in headlines alone, testing multiple baselines ranging from standard machine learning algorithms to deep learning models. Our experiments show that Bidirectional Transformer models outperform both standard machine-learning approaches and Large Language Models (LLMs), particularly when the meta-learning Reptile approach is employed.


Toward Dependency Dynamics in Multi-Agent Reinforcement Learning for Traffic Signal Control

Zhang, Yuli, Wang, Shangbo, Jia, Dongyao, Fan, Pengfei, Jiang, Ruiyuan, Gu, Hankang, Chow, Andy H. F.

arXiv.org Artificial Intelligence

Reinforcement learning (RL) emerges as a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, with deep neural networks substantially augmenting its learning capabilities. However, centralized RL becomes impractical for ATSC involving multiple agents due to the exceedingly high dimensionality of the joint action space. Multi-agent RL (MARL) mitigates this scalability issue by decentralizing control to local RL agents. Nevertheless, this decentralized method introduces new challenges: the environment becomes partially observable from the perspective of each local agent due to constrained inter-agent communication. Both centralized RL and MARL exhibit distinct strengths and weaknesses, particularly under heavy intersectional traffic conditions. In this paper, we justify that MARL can achieve the optimal global Q-value by separating into multiple IRL (Independent Reinforcement Learning) processes when no spill-back congestion occurs (no agent dependency) among agents (intersections). In the presence of spill-back congestion (with agent dependency), the maximum global Q-value can be achieved by using centralized RL. Building upon the conclusions, we propose a novel Dynamic Parameter Update Strategy for Deep Q-Network (DQN-DPUS), which updates the weights and bias based on the dependency dynamics among agents, i.e. updating only the diagonal sub-matrices for the scenario without spill-back congestion. We validate the DQN-DPUS in a simple network with two intersections under varying traffic, and show that the proposed strategy can speed up the convergence rate without sacrificing optimal exploration. The results corroborate our theoretical findings, demonstrating the efficacy of DQN-DPUS in optimizing traffic signal control.


A Global Cybersecurity Standardization Framework for Healthcare Informatics

Gupta, Kishu, Mishra, Vinaytosh, Makkar, Aaisha

arXiv.org Artificial Intelligence

Healthcare has witnessed an increased digitalization in the post-COVID world. Technologies such as the medical internet of things and wearable devices are generating a plethora of data available on the cloud anytime from anywhere. This data can be analyzed using advanced artificial intelligence techniques for diagnosis, prognosis, or even treatment of disease. This advancement comes with a major risk to protecting and securing protected health information (PHI). The prevailing regulations for preserving PHI are neither comprehensive nor easy to implement. The study first identifies twenty activities crucial for privacy and security, then categorizes them into five homogeneous categories namely: $\complement_1$ (Policy and Compliance Management), $\complement_2$ (Employee Training and Awareness), $\complement_3$ (Data Protection and Privacy Control), $\complement_4$ (Monitoring and Response), and $\complement_5$ (Technology and Infrastructure Security) and prioritizes these categories to provide a framework for the implementation of privacy and security in a wise manner. The framework utilized the Delphi Method to identify activities, criteria for categorization, and prioritization. Categorization is based on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and prioritization is performed using a Technique for Order of Preference by Similarity to the Ideal Solution (TOPSIS). The outcomes conclude that $\complement_3$ activities should be given first preference in implementation and followed by $\complement_1$ and $\complement_2$ activities. Finally, $\complement_4$ and $\complement_5$ should be implemented. The prioritized view of identified clustered healthcare activities related to security and privacy, are useful for healthcare policymakers and healthcare informatics professionals.


Benchmarking of LLM Detection: Comparing Two Competing Approaches

Pröhl, Thorsten, Putzier, Erik, Zarnekow, Rüdiger

arXiv.org Artificial Intelligence

This article gives an overview of the field of LLM text recognition. Different approaches and implemented detectors for the recognition of LLM-generated text are presented. In addition to discussing the implementations, the article focuses on benchmarking the detectors. Although there are numerous software products for the recognition of LLM-generated text, with a focus on ChatGPT-like LLMs, the quality of the recognition (recognition rate) is not clear. Furthermore, while it can be seen that scientific contributions presenting their novel approaches strive for some kind of comparison with other approaches, the construction and independence of the evaluation dataset is often not comprehensible. As a result, discrepancies in the performance evaluation of LLM detectors are often visible due to the different benchmarking datasets. This article describes the creation of an evaluation dataset and uses this dataset to investigate the different detectors. The selected detectors are benchmarked against each other.


Grounding Toxicity in Real-World Events across Languages

Tufa, Wondimagegnhue Tsegaye, Markov, Ilia, Vossen, Piek

arXiv.org Artificial Intelligence

Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages (Dutch, English, German, Arabic, Turkish and Spanish). We target fifteen major social and political world events that occurred between 2020 and 2023. We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities, showing that toxicity is a complex phenomenon in which many different factors interact and still need to be investigated. We will release the data for further research along with our code.


UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models

Kapsalis, Timo

arXiv.org Artificial Intelligence

In contemporary design practices, the integration of computer vision and generative artificial intelligence (genAI) represents a transformative shift towards more interactive and inclusive processes. These technologies offer new dimensions of image analysis and generation, which are particularly relevant in the context of urban landscape reconstruction. This paper presents a novel workflow encapsulated within a prototype application, designed to leverage the synergies between advanced image segmentation and diffusion models for a comprehensive approach to urban design. Our methodology encompasses the OneFormer model for detailed image segmentation and the Stable Diffusion XL (SDXL) diffusion model, implemented through ControlNet, for generating images from textual descriptions. Validation results indicated a high degree of performance by the prototype application, showcasing significant accuracy in both object detection and text-to-image generation. This was evidenced by superior Intersection over Union (IoU) and CLIP scores across iterative evaluations for various categories of urban landscape features. Preliminary testing included utilising UrbanGenAI as an educational tool enhancing the learning experience in design pedagogy, and as a participatory instrument facilitating community-driven urban planning. Early results suggested that UrbanGenAI not only advances the technical frontiers of urban landscape reconstruction but also provides significant pedagogical and participatory planning benefits. The ongoing development of UrbanGenAI aims to further validate its effectiveness across broader contexts and integrate additional features such as real-time feedback mechanisms and 3D modelling capabilities. Keywords: generative AI; panoptic image segmentation; diffusion models; urban landscape design; design pedagogy; co-design


CADgpt: Harnessing Natural Language Processing for 3D Modelling to Enhance Computer-Aided Design Workflows

Kapsalis, Timo

arXiv.org Artificial Intelligence

This paper introduces CADgpt, an innovative plugin integrating Natural Language Processing (NLP) with Rhino3D for enhancing 3D modelling in computer-aided design (CAD) environments. Leveraging OpenAI's GPT-4, CADgpt simplifies the CAD interface, enabling users, particularly beginners, to perform complex 3D modelling tasks through intuitive natural language commands. This approach significantly reduces the learning curve associated with traditional CAD software, fostering a more inclusive and engaging educational environment. The paper discusses CADgpt's technical architecture, including its integration within Rhino3D and the adaptation of GPT-4 capabilities for CAD tasks. It presents case studies demonstrating CADgpt's efficacy in various design scenarios, highlighting its potential to democratise design education by making sophisticated design tools accessible to a broader range of students. The discussion further explores CADgpt's implications for pedagogy and curriculum development, emphasising its role in enhancing creative exploration and conceptual thinking in design education. Keywords: Natural Language Processing, Computer-Aided Design, 3D Modelling, Design Automation, Design Education, Architectural Education


Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets

Subhani, Moeez M., Anjum, Ashiq

arXiv.org Machine Learning

Clinical predictions using clinical data by computational methods are common in bioinformatics. However, clinical predictions using information from genomics datasets as well is not a frequently observed phenomenon in research. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we have attempted to create a prediction model which uses information from both clinical and genomics datasets. We have demonstrated multiclass disease predictions based on combined clinical and genomics datasets using machine learning methods. We have created an integrated dataset, using a clinical (ClinVar) and a genomics (gene expression) dataset, and trained it using instance-based learner to predict clinical diseases. We have used an innovative but simple way for multiclass classification, where the number of output classes is as high as 75. We have used Principal Component Analysis for feature selection. The classifier predicted diseases with 73\% accuracy on the integrated dataset. The results were consistent and competent when compared with other classification models. The results show that genomics information can be reliably included in datasets for clinical predictions and it can prove to be valuable in clinical diagnostics and precision medicine.


AI Iot - How AI Will Take Industrial Internet of Things to New Heights

#artificialintelligence

Before we delve deep into this subject, let's hear what expert-level research has to say about both of these technologies: We can keep on going with the many more remarkable statistics about AI and IoT, but these ones should be enough for now. The re-emergence of the decades-old technological ideas like Artificial Intelligence and the Internet of Things, at the right time and right place, has suddenly disrupted the traditional industrial norms – for the better this time. It has kickstarted a digital revolution that was only possible way back in the science fiction writings of H.G. Wells, Jules Verne, Arthur Conan Doyle, or other masterminds of Sci-Fis. It has ushered the classical Industrial Revolution of the 18th century into the Industry 4.0 of the 21st. The early proponents and experts of both technologies were simply ecstatic about the outstanding transformational possibilities a union between AI and IoT could produce.